 simulation scenario


All Emulators are Wrong, Many are Useful, and Some are More Useful Than Others: A Reproducible Comparison of Computer Model Surrogates

Rumsey, Kellin N., Gibson, Graham C., Francom, Devin, Morris, Reid

arXiv.org Machine Learning

Accurate and efficient surrogate modeling is essential for modern computational science, and there are a staggering number of emulation methods to choose from. With new methods being developed all the time, comparing the relative strengths and weaknesses of different methods remains a challenge due to inconsistent benchmarking practices and (sometimes) limited reproducibility and transparency. In this work, we present a large-scale, fully reproducible comparison of $29$ distinct emulators across $60$ canonical test functions and $40$ real emulation datasets. To facilitate rigorous, apples-to-apples comparisons, we introduce the R package \texttt{duqling}, which streamlines reproducible simulation studies using a consistent, simple syntax, and automatic internal scaling of inputs. This framework allows researchers to compare emulators in a unified environment and makes it possible to replicate or extend previous studies with minimal effort, even across different publications. Our results provide detailed empirical insight into the strengths and weaknesses of state-of-the-art emulators and offer guidance for both method developers and practitioners selecting a surrogate for new data. We discuss best practices for emulator comparison and highlight how \texttt{duqling} can accelerate research in emulator design and application.


Conceptualizing and Modeling Communication-Based Cyberattacks on Automated Vehicles

Li, Tianyi, Liu, Tianyu, Yang, Yicheng

arXiv.org Artificial Intelligence

Adaptive Cruise Control (ACC) is rapidly proliferating across electric vehicles (EVs) and internal combustion engine (ICE) vehicles, enhancing traffic flow while simultaneously expanding the attack surface for communication-based cyberattacks. Because the two powertrains translate control inputs into motion differently, their cyber-resilience remains unquantified. Therefore, we formalize six novel message-level attack vectors and implement them in a ring-road simulation that systematically varies the ACC market penetration rates (MPRs) and the spatial pattern of compromised vehicles. A three-tier risk taxonomy converts disturbance metrics into actionable defense priorities for practitioners. Across all simulation scenarios, EV platoons exhibit lower velocity standard deviation, reduced spacing oscillations, and faster post-attack recovery compared to ICE counterparts, revealing an inherent stability advantage. These findings clarify how controller-to-powertrain coupling influences vulnerability and offer quantitative guidance for the detection and mitigation of attacks in mixed automated traffic.
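The disturbance metrics named in this abstract (velocity standard deviation, spacing oscillation, post-attack recovery) can be illustrated with a minimal sketch. The abstract does not give the paper's exact definitions, so the function names, the peak-to-peak definition of oscillation, and the tolerance-band definition of recovery below are all assumptions made for illustration.

```python
import statistics

def velocity_std(velocities):
    """Population standard deviation of a vehicle's speed trace (m/s)."""
    return statistics.pstdev(velocities)

def spacing_oscillation(spacings):
    """Peak-to-peak amplitude of the gap to the leading vehicle (m).
    Assumed definition; the paper may use a different one."""
    return max(spacings) - min(spacings)

def recovery_time(velocities, target, tol, attack_end, dt=1.0):
    """Time after the attack ends (index attack_end) until the speed
    stays within tol of target for the rest of the trace.
    Returns None if the trace never settles. Assumed definition."""
    for i in range(attack_end, len(velocities)):
        if all(abs(v - target) <= tol for v in velocities[i:]):
            return (i - attack_end) * dt
    return None

# Hypothetical speed and spacing traces sampled once per second.
speeds = [10, 6, 7, 9, 10, 10]
gaps = [20, 25, 18]
print(velocity_std(speeds))
print(spacing_oscillation(gaps))
print(recovery_time(speeds, target=10, tol=0.5, attack_end=1))
```

Under these assumed definitions, lower values of all three metrics would correspond to the more stable platoon behavior the abstract attributes to EVs.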


From Dashcam Videos to Driving Simulations: Stress Testing Automated Vehicles against Rare Events

Miao, Yan, Fainekos, Georgios, Hoxha, Bardh, Okamoto, Hideki, Prokhorov, Danil, Mitra, Sayan

arXiv.org Artificial Intelligence

Testing Automated Driving Systems (ADS) in simulation with realistic driving scenarios is important for verifying their performance. However, converting real-world driving videos into simulation scenarios is a significant challenge due to the complexity of interpreting high-dimensional video data and the time-consuming nature of precise manual scenario reconstruction. In this work, we propose a novel framework that automates the conversion of real-world car crash videos into detailed simulation scenarios for ADS testing. Our approach leverages prompt-engineered Video Language Models (VLMs) to transform dashcam footage into SCENIC scripts, which define the environment and driving behaviors in the CARLA simulator, enabling the generation of realistic simulation scenarios. Importantly, rather than solely aiming for one-to-one scenario reconstruction, our framework focuses on capturing the essential driving behaviors from the original video while offering flexibility in parameters such as weather or road conditions to facilitate search-based testing. Additionally, we introduce a similarity metric that helps iteratively refine the generated scenario through feedback by comparing key features of driving behaviors between the real and simulated videos. Our preliminary results demonstrate substantial time efficiency, finishing the real-to-sim conversion in minutes with full automation and no human intervention, while maintaining high fidelity to the original driving events.


ChatSUMO: Large Language Model for Automating Traffic Scenario Generation in Simulation of Urban MObility

Li, Shuyang, Azfar, Talha, Ke, Ruimin

arXiv.org Artificial Intelligence

Large Language Models (LLMs), capable of handling multi-modal inputs and outputs such as text, voice, images, and video, are transforming the way we process information. Beyond just generating textual responses to prompts, they can integrate with different software platforms to offer comprehensive solutions across diverse applications. In this paper, we present ChatSUMO, an LLM-based agent that integrates language processing skills to generate abstract and real-world simulation scenarios in the widely used traffic simulator Simulation of Urban MObility (SUMO). Our methodology begins by using the LLM to convert user input into the relevant keywords needed to run Python scripts. These scripts are designed to convert specified regions into coordinates, fetch data from OpenStreetMap, transform it into a road network, and subsequently run SUMO simulations with the designated traffic conditions. The outputs of the simulations are then interpreted by the LLM, resulting in informative comparisons and summaries. Users can continue the interaction and generate a variety of customized scenarios without prior traffic simulation expertise. For simulation generation, we created a real-world simulation for the city of Albany with an accuracy of 96\%. ChatSUMO also lets users effectively customize scenarios through edge editing, traffic light optimization, and vehicle editing.


SimBench: A Rule-Based Multi-Turn Interaction Benchmark for Evaluating an LLM's Ability to Generate Digital Twins

Wang, Jingquan, Zhang, Harry, Unjhawala, Huzaifa Mustafa, Negrut, Peter, Wang, Shu, Slaton, Khailanii, Serban, Radu, Wu, Jin-Long, Negrut, Dan

arXiv.org Artificial Intelligence

We introduce SimBench, a benchmark designed to evaluate the proficiency of student large language models (S-LLMs) in generating digital twins (DTs) that can be used in simulators for virtual testing. Given a collection of S-LLMs, this benchmark enables the ranking of the S-LLMs based on their ability to produce high-quality DTs. We demonstrate this by comparing over 20 open- and closed-source S-LLMs. Using multi-turn interactions, SimBench employs a rule-based judge LLM (J-LLM) that leverages both predefined rules and human-in-the-loop guidance to assign scores for the DTs generated by the S-LLM, thus providing a consistent and expert-inspired evaluation protocol. The J-LLM is specific to a simulator, and herein the proposed benchmarking approach is demonstrated in conjunction with the Chrono multi-physics simulator. Chrono provided the backdrop used to assess an S-LLM in relation to the latter's ability to create digital twins for multibody dynamics, finite element analysis, vehicle dynamics, robotic dynamics, and sensor simulations. The proposed benchmarking principle is broadly applicable and enables the assessment of an S-LLM's ability to generate digital twins for other simulation packages. All code and data are available at https://github.com/uwsbel/SimBench.


Hidden or Inferred: Fair Learning-To-Rank with Unknown Demographics

Olulana, Oluseun, Cachel, Kathleen, Murai, Fabricio, Rundensteiner, Elke

arXiv.org Artificial Intelligence

As learning-to-rank models are increasingly deployed for decision-making in areas with profound life implications, the FairML community has been developing fair learning-to-rank (LTR) models. These models rely on the availability of sensitive demographic features such as race or sex. However, in practice, regulatory obstacles and privacy concerns protect this data from collection and use. As a result, practitioners may either need to promote fairness despite the absence of these features or turn to demographic inference tools to attempt to infer them. Given that these tools are fallible, this paper aims to further understand how errors in demographic inference impact the fairness performance of popular fair LTR strategies. In which cases would it be better to keep such demographic attributes hidden from models versus infer them? We examine a spectrum of fair LTR strategies ranging from fair LTR with and without demographic features hidden versus inferred to fairness-unaware LTR followed by fair re-ranking. We conduct a controlled empirical investigation modeling different levels of inference errors by systematically perturbing the inferred sensitive attribute. We also perform three case studies with real-world datasets and popular open-source inference methods. Our findings reveal that as inference noise grows, LTR-based methods that incorporate fairness considerations into the learning process may increase bias. In contrast, fair re-ranking strategies are more robust to inference errors. All source code, data, and experimental artifacts of our experimental study are available here: https://github.com/sewen007/hoiltr.git
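The controlled perturbation described in this abstract can be pictured with a minimal sketch. It assumes a binary sensitive attribute and independent label flips at a fixed error rate; the function name `perturb_attribute` and this noise model are illustrative assumptions, not the authors' implementation.

```python
import random

def perturb_attribute(labels, error_rate, seed=0):
    """Flip each binary group label (0 or 1) independently with
    probability error_rate, simulating demographic inference errors."""
    rng = random.Random(seed)  # seeded so the perturbation is repeatable
    return [1 - g if rng.random() < error_rate else g for g in labels]

# Hypothetical inferred group labels for eight ranked candidates.
inferred = [0, 1, 1, 0, 1, 0, 0, 1]
for rate in (0.0, 0.25, 0.5):
    noisy = perturb_attribute(inferred, rate)
    flipped = sum(a != b for a, b in zip(inferred, noisy))
    print(f"error_rate={rate}: {flipped} of {len(inferred)} labels flipped")
```

Sweeping `error_rate` and re-running a fair ranking method on the noisy labels is one way to trace how a fairness metric degrades as inference noise grows.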


Automated Generation of High-Quality Medical Simulation Scenarios Through Integration of Semi-Structured Data and Large Language Models

Sumpter, Scott

arXiv.org Artificial Intelligence

This study introduces a transformative framework for medical education by integrating semi-structured data with Large Language Models (LLMs), primarily OpenAI's ChatGPT3.5, to automate the creation of medical simulation scenarios. Traditionally, developing these scenarios was a time-intensive process with limited flexibility to meet diverse educational needs. The proposed approach utilizes AI to efficiently generate detailed, clinically relevant scenarios that are tailored to specific educational objectives. This innovation has significantly reduced the time and resources required for scenario development, allowing for a broader variety of simulations. Preliminary feedback from educators and learners has shown enhanced engagement and improved knowledge acquisition, confirming the effectiveness of this AI-enhanced methodology in simulation-based learning. The integration of structured data with LLMs not only streamlines the creation process but also offers a scalable, dynamic solution that could revolutionize medical training, highlighting the critical role of AI in advancing educational outcomes and patient care standards.


Robotics as a Simulation Educational Tool

Karagounis, Athanasios

arXiv.org Artificial Intelligence

In the evolving landscape of education, robotics has emerged as a powerful tool for fostering creativity, critical thinking, and problem-solving skills among students of all ages. This innovative approach to learning seamlessly integrates STEM (Science, Technology, Engineering, and Mathematics) concepts, creating an engaging and immersive learning experience. Educational robotics transcends traditional classroom settings, transforming learning into a hands-on, experiential endeavor. Students are actively involved in the design, construction, and programming of robots, allowing them to apply theoretical concepts to practical applications. This hands-on approach fosters deeper understanding and retention of knowledge, making learning more meaningful and enjoyable. In this paper, the potential of simulation robotics is evaluated as a hands-on, interactive learning experience that goes beyond traditional robotic classroom methods.


Exploration of the Assessment for AVP Algorithm Training in Underground Parking Garages Simulation Scenario

Li, Wenjin

arXiv.org Artificial Intelligence

Simulation test scenarios are an important part of helping autonomous driving algorithms improve, but current simulation scenarios are still limited to manual approaches. The ultimate goal of this project is to generate Autonomous Valet Parking (AVP) simulation test scenarios in underground garages with differentiated difficulty levels through reinforcement learning, which will challenge the vehicle-side AVP algorithms and ultimately improve the algorithmic test metrics. Abdullah's study, as described in [1], compared the space utilization efficiency of diagonal, parallel, and perpendicular parking methods. The research findings concluded that perpendicular parking methods yield the highest number of parking spaces. This conclusion was drawn using a university as a specific example. The study summarized in [2] focuses on smart parking solutions, emphasizing their significance in the context of urban growth and traffic congestion.


Robust Principal Component Analysis using Density Power Divergence

Roy, Subhrajyoty, Basu, Ayanendranath, Ghosh, Abhik

arXiv.org Machine Learning

Principal component analysis (PCA) is a widely employed statistical tool used primarily for dimensionality reduction. However, it is known to be adversely affected by the presence of outlying observations in the sample, which is quite common. Robust PCA methods using M-estimators have theoretical benefits, but their robustness drops substantially for high-dimensional data. On the other end of the spectrum, robust PCA algorithms solving principal component pursuit or similar optimization problems have high breakdown, but lack theoretical richness and demand high computational power compared to the M-estimators. We introduce a novel robust PCA estimator based on the minimum density power divergence estimator. This combines the theoretical strength of the M-estimators and the minimum divergence estimators with a high breakdown guarantee regardless of data dimension. We present a computationally efficient algorithm for this estimate. Our theoretical findings are supported by extensive simulations and comparisons with existing robust PCA methods. We also showcase the proposed algorithm's applicability on two benchmark datasets and a credit card transactions dataset for fraud detection.
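For readers unfamiliar with the divergence this abstract builds on, the standard density power divergence between a true density $g$ and a model density $f_\theta$, with tuning parameter $\alpha > 0$, is sketched below. The notation ($g$, $f_\theta$, $\alpha$, $X_i$, $n$) is chosen here for illustration; the abstract does not state how the paper adapts this objective to the PCA setting.

```latex
\[
d_\alpha(g, f_\theta)
  = \int \left\{ f_\theta^{1+\alpha}(x)
  - \left(1 + \frac{1}{\alpha}\right) g(x)\, f_\theta^{\alpha}(x)
  + \frac{1}{\alpha}\, g^{1+\alpha}(x) \right\} dx .
\]
% The last term does not depend on \theta, so given a sample
% X_1, \dots, X_n the minimum density power divergence estimator minimizes
\[
H_n(\theta)
  = \int f_\theta^{1+\alpha}(x)\, dx
  - \left(1 + \frac{1}{\alpha}\right) \frac{1}{n} \sum_{i=1}^{n} f_\theta^{\alpha}(X_i) .
\]
```

As $\alpha \to 0$ the divergence tends to the Kullback-Leibler divergence, so the estimator approaches maximum likelihood; larger $\alpha$ downweights observations with low model density, trading efficiency for robustness to outliers.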